Improving Spam Blacklisting Through Dynamic Thresholding and Speculative Aggregation
Abstract
Unsolicited bulk e-mail (UBE), or spam, accounts for a significant fraction of all e-mail connection attempts. It routinely frustrates users, consumes resources, and serves as an infection vector for malicious software. To reduce the impact of these e-mails scalably and effectively, e-mail system designers have increasingly turned to blacklisting. Blacklisting (also called blackholing or block listing) is a form of coarse-grained, reputation-based, dynamic policy enforcement in which real-time feeds of spam-sending hosts are distributed to networks so that e-mail from these hosts can be rejected. Unfortunately, current spam blacklist services are highly inaccurate, exhibiting both false positives and significant false negatives. In this paper, we explore the root causes of blacklist inaccuracy and show that the trend toward stealthier spam exacerbates the existing tension between false positives and false negatives when assigning reputation to spamming IP addresses. We argue that to relieve this tension, global aggregation and reputation assignment should be replaced with local aggregation and reputation assignment, which draw on preexisting global spam collection augmented with local usage, policy, and reachability information. We propose two specific techniques based on this premise, dynamic thresholding and speculative aggregation, whose goal is to improve the accuracy of blacklist generation. We evaluate the performance and accuracy of these techniques in our own deployment, using 2.5 million production e-mails and 14 million e-mails from spamtraps deployed in 11 domains over a month-long period. We show that the proposed approaches significantly improve false positive and false negative rates compared with existing approaches.
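The abstract does not spell out the exact rule behind either technique. As an illustration only, the following minimal Python sketch shows one plausible reading of local reputation assignment: globally collected spamtrap hit counts are weighed against a site's own record of legitimate mail from each IP (so the listing bar rises with local usage), and listings are speculatively widened to a /24 prefix only when no address in that prefix has sent legitimate mail locally. The function names (`ratio_blacklist`, `speculative_aggregate`, `is_blocked`), the ratio rule, the +1 smoothing, the threshold of 10, the `min_listed` value, and the /24 granularity are all hypothetical assumptions, not the paper's algorithm.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network


def ratio_blacklist(trap_hits, local_good, threshold=10.0):
    """List IPs whose spamtrap hits are high relative to local legitimate use.

    trap_hits:  {ip: messages seen at (globally collected) spamtraps}
    local_good: {ip: legitimate messages this site accepted from ip}
    The per-IP bar rises with local usage; the +1 avoids division by zero
    for IPs with no local history. Both choices are hypothetical.
    """
    listed = set()
    for ip, hits in trap_hits.items():
        if hits / (local_good.get(ip, 0) + 1) > threshold:
            listed.add(ip)
    return listed


def speculative_aggregate(listed, local_good, min_listed=2, prefix_len=24):
    """Widen listings to whole prefixes (hypothetical rule).

    A prefix is listed when at least `min_listed` of its addresses are
    already listed and no address in it has sent legitimate mail locally.
    """
    by_prefix = defaultdict(set)
    for ip in listed:
        by_prefix[ip_network(f"{ip}/{prefix_len}", strict=False)].add(ip)

    # Prefixes this site has legitimate traffic from are never widened.
    good_prefixes = {
        ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip, count in local_good.items()
        if count > 0
    }
    return {
        net
        for net, members in by_prefix.items()
        if len(members) >= min_listed and net not in good_prefixes
    }


def is_blocked(ip, listed, prefixes):
    """Reject if the exact IP is listed or it falls in a listed prefix."""
    addr = ip_address(ip)
    return ip in listed or any(addr in net for net in prefixes)


if __name__ == "__main__":
    # Addresses are from the RFC 5737 documentation ranges.
    trap_hits = {"192.0.2.10": 50, "192.0.2.11": 30, "198.51.100.7": 40}
    local_good = {"198.51.100.7": 20, "198.51.100.8": 5}

    listed = ratio_blacklist(trap_hits, local_good)
    prefixes = speculative_aggregate(listed, local_good)
    print(sorted(listed))  # ['192.0.2.10', '192.0.2.11']
    print(is_blocked("192.0.2.99", listed, prefixes))  # True (via 192.0.2.0/24)
    print(is_blocked("198.51.100.7", listed, prefixes))  # False (ratio 40/21 < 10)
```

Because every decision consults `local_good`, the same global spamtrap feed can yield different blacklists at different sites, which is the shift from global to local reputation assignment that the abstract argues for.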
Publication date: 2010